An Ensemble Classifier for Eukaryotic Protein Subcellular Location Prediction Using Gene Ontology Categories and Amino Acid Hydrophobicity
Authors
Abstract
With the rapid increase in the number of protein sequences in the post-genomic era, it is challenging to develop accurate, automated methods that predict their subcellular localizations reliably and quickly. Many approaches have been tried to date, but most of them use only a single algorithm. In this paper, we propose an ensemble classifier that combines the KNN (k-nearest neighbor) and SVM (support vector machine) algorithms through a voting system to predict the subcellular localization of eukaryotic proteins. With the one-versus-one strategy, the overall prediction accuracies are 78.17%, 89.94% and 75.55% on three benchmark datasets of eukaryotic proteins. The improved prediction accuracies indicate that Gene Ontology (GO) annotations and amino acid hydrophobicity help to predict the subcellular locations of eukaryotic proteins.
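The abstract does not spell out how the features are built or how the KNN and SVM votes are combined, so the following is only a minimal sketch of one plausible reading, under stated assumptions. Each protein is encoded as hypothetical GO-term indicator bits plus its mean Kyte-Doolittle hydropathy (the paper's actual GO vocabulary and hydrophobicity scale are assumptions here). For every pair of locations, a pairwise KNN and a pairwise linear classifier each cast one vote, and the location with the most votes wins. To keep the sketch dependency-free, a nearest-centroid rule stands in for the trained SVM; it is not an SVM. The names `encode`, `knn_vote`, `centroid_vote`, `ensemble_ovo_predict` and the toy training data are all illustrative.

```python
import math
from collections import Counter
from itertools import combinations

# Kyte-Doolittle hydropathy scale. ASSUMPTION: the paper's actual
# hydrophobicity scale is not stated in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(go_terms, sequence, go_vocab):
    """Hypothetical features: one indicator bit per GO term plus mean hydropathy."""
    bits = [1.0 if term in go_terms else 0.0 for term in go_vocab]
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence if aa in KYTE_DOOLITTLE]
    mean_hydropathy = sum(scores) / len(scores) if scores else 0.0
    return bits + [mean_hydropathy]


def knn_vote(X, y, x, k):
    """KNN: majority label among the k training vectors closest to x."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def centroid_vote(X, y, x):
    """Stand-in for a trained pairwise SVM (NOT an SVM): pick the closer class centroid."""
    centroids = {}
    for label in set(y):
        rows = [X[i] for i, lab in enumerate(y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(sorted(centroids), key=lambda c: math.dist(centroids[c], x))


def ensemble_ovo_predict(X, y, x, k=3):
    """One-versus-one voting: for each class pair, both base classifiers,
    restricted to that pair's training samples, cast one vote each."""
    classes = sorted(set(y))
    votes = Counter()
    for a, b in combinations(classes, 2):
        idx = [i for i, label in enumerate(y) if label in (a, b)]
        X_pair = [X[i] for i in idx]
        y_pair = [y[i] for i in idx]
        votes[knn_vote(X_pair, y_pair, x, k)] += 1
        votes[centroid_vote(X_pair, y_pair, x)] += 1
    # The class with the most votes wins; ties go to the alphabetically first class.
    return max(classes, key=lambda c: votes[c])


# Toy data (hypothetical): GO:0005634 nucleus, GO:0005739 mitochondrion,
# GO:0005886 plasma membrane.
GO_VOCAB = ["GO:0005634", "GO:0005739", "GO:0005886"]
TRAIN = [
    ({"GO:0005634"}, "KKRR", "nucleus"),
    ({"GO:0005634"}, "KRKR", "nucleus"),
    ({"GO:0005634"}, "KKKK", "nucleus"),
    ({"GO:0005739"}, "AGAG", "mito"),
    ({"GO:0005739"}, "AAGG", "mito"),
    ({"GO:0005739"}, "GGGG", "mito"),
    ({"GO:0005886"}, "LLVI", "membrane"),
    ({"GO:0005886"}, "IIII", "membrane"),
    ({"GO:0005886"}, "LLLL", "membrane"),
]
X = [encode(go, seq, GO_VOCAB) for go, seq, _ in TRAIN]
y = [label for _, _, label in TRAIN]

print(ensemble_ovo_predict(X, y, encode({"GO:0005886"}, "VVLL", GO_VOCAB)))  # → membrane
```

Replacing `centroid_vote` with a real SVM (for example scikit-learn's `SVC`, which handles multi-class problems with a one-versus-one scheme) would bring the sketch closer to the method the abstract describes. The value of `k`, the GO vocabulary, and the tie-breaking rule are illustrative choices, not values taken from the paper.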
Similar resources
Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks
Background: Prediction of protein localization is among the most important issues in bioinformatics; it is used to predict the location of proteins in cells and in organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...
Prediction of protein subcellular locations using Markov chain models.
A novel method was introduced to predict protein subcellular locations from sequences. Using sequence data, this method achieved a prediction accuracy higher than previous methods based on the amino acid composition. For three subcellular locations in a prokaryotic organism, the overall prediction accuracy reached 89.1%. For eukaryotic proteins, prediction accuracies of 73.0% and 78.7% were att...
Prediction of Protein Subcellular Multi-localization by Using a Min-Max Modular Support Vector Machine
Prediction of protein subcellular location is an important issue in computational biology because it provides important clues for characterization of protein function. Currently, much effort has been dedicated to developing automatic prediction tools. However, most of them focus on mono-locational proteins. It should be noted that many proteins bear multi-locational characteristics, and they ca...
Prediction of Subcellular Localization of Apoptosis Proteins by Dipeptide Composition
Using cluster analysis, all dipeptides are classified into 16 categories according to their hydrophobicity. Based on the composition of these dipeptide categories, a novel representation of protein sequences is proposed here to predict the subcellular location of apoptosis proteins. Using a K-Nearest Neighbor classifier and testing on a known dataset that includes 317 apoptosis proteins, the high...
MultiLoc2 and SherLoc2: improved prediction of subcellular protein localization
The function of a protein is highly correlated with its subcellular localization. However, determining the subcellular localization of a protein experimentally can be difficult and time-consuming. Computational methods for the prediction of subcellular locations of proteins from the sequence alone are an attractive alternative. MultiLoc2 [1] and SherLoc2 [3] both significantly extend and improv...
Journal title:
Volume 7, Issue
Pages: -
Publication date: 2012